ABSTRACT

Almost  as  soon  as  digital  computers  became  available,  it  was  realized  that  they  could  be 
used  to  process  and  extract  information  from  digitized  images.  Initially,  work  on  digital  image 
analysis  dealt  with  specific  classes  of  images  such  as  text,  photomicrographs,  nuclear  particle 
tracks,  and  aerial  photographs;  but  by  the  1960’s,  general  algorithms  and  paradigms  for 
image  analysis  began  to  be  formulated.  When  the  artificial  intelligence  community  began  to 
work  on  robot  vision,  these  paradigms  were  extended  to  include  recovery  of  three-dimensional 
information,  at  first  from  single  images  of  a  scene,  but  eventually  from  image  sequences 
obtained  by  a  moving  camera;  at  this  stage,  image  analysis  had  become  scene  analysis  or 
computer  vision.  This  paper  reviews  research  on  digital  image  and  scene  analysis  through  the 
1970’s.  This  research  has  led  to  the  formulation  of  many  elegant  mathematical  models  and 
algorithms;  but  practical  progress  has  largely  been  due  to  enormous  increases  in  computer 
power,  allowing  even  “brute  force”  algorithms  to  be  implemented  very  rapidly. 
1  Computers  and  images


In  the  1950’s,  digital  computers  began  to  become  available  in  research  laboratories  and 
to  be  used  for  processing  various  types  of  data.  By  the  mid  ’50s,  it  was  realized  that 
computers  could  be  used  to  process  images,  if  the  images  could  first  be  converted  to  digital 
form.  An  image  is  digitized  by  sampling  its  lightness  value  (“gray  level”)  at  a  regularly 
spaced  array  of  points  and  quantizing  it  into  a  discrete  set  of  values.  [We  are  assuming  that 
the  image  is  “black-and-white”;  no  attempt  was  made  to  deal  with  color  images  in  those 
days.]  The  resulting  array  of  numbers,  representing  the  gray  levels  of  individual  picture 
elements  (“pixels”),  can,  if  necessary,  be  input  to  the  computer  using  punched  cards  or  tape 
[45,  233,  234];  but  it  is  much  less  tedious  if  some  direct  means  of  inputting  image  data  to 
the  computer  can  be  provided.  This  can  be  done  by  scanning  the  image,  using  a  drum 
scanner [141] or flying spot scanner [92], to yield a time-varying signal; this signal can then
be  sampled,  quantized,  and  input  to  the  computer  directly. 
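
The two steps of digitization, sampling on a regular grid and quantizing each sample, can be sketched as follows. This is only an illustration under stated assumptions: the function name digitize, the callable lightness (assumed to return a value between 0 and 1 at any point of the picture), and the choice of sampling at cell centers are all hypothetical and are not taken from any of the systems cited above.

```python
def digitize(lightness, width, height, rows, cols, levels=64):
    """Sample lightness(x, y) on a rows x cols grid of cell centers and
    quantize each sample into one of `levels` discrete gray levels."""
    image = []
    for r in range(rows):
        y = (r + 0.5) * height / rows           # sample point, vertical
        row = []
        for c in range(cols):
            x = (c + 0.5) * width / cols        # sample point, horizontal
            value = min(max(lightness(x, y), 0.0), 1.0)   # clamp to [0, 1]
            row.append(min(int(value * levels), levels - 1))
        image.append(row)
    return image


# A left-to-right lightness ramp, sampled at 2 x 4 points with 4 gray levels.
ramp = digitize(lambda x, y: x / 4, width=4, height=2, rows=2, cols=4, levels=4)
```

The result is an array of small integers, one per pixel, which is the form in which the image would be presented to the computer.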

Once  an  image  has  been  input  to  a  computer,  the  computer  can  be  programmed  to  do  a 
great  variety  of  things  to  it.  As  we  shall  see  in  Section  2,  an  image  can  be  processed  to  produce 
other  images,  or  it  can  be  analyzed  to  derive  various  types  of  descriptive  information  about 
it — for  example,  to  classify  it  in  some  way  (“pattern  recognition”).  [Conversely,  descriptive 
information  can  be  used  to  synthesize  images;  image  synthesis  later  became  a  major  area 
of  computer  graphics.  The  conceptual  relationship  among  image  processing,  image  analysis, 
and  image  synthesis  is  summarized  in  the  2-by-2  table  shown  in  Figure  1.]  Note  that  image 
processing  requires  not  only  a  method  of  inputting  images  to  the  computer,  but  also  a  method 
of  outputting  or  displaying  the  processed  images.  Today’s  high-resolution  computer-driven 
displays  were  still  far  in  the  future,  but  there  were  simple  ways  to  generate  hard-copy  images, 
including  the  use  of  overstrike  to  create  halftone-like  “grayscale”  images  on  an  alphanumeric 
printer.  An  example  of  an  image  displayed  in  this  way  is  shown  in  Figure  2;  a  paper  on 
optimization  of  overstrike-based  grayscales  is  [110]. 

Digital  computers  were  by  no  means  the  only  tools  that  could  be  used  to  process  or 
analyze  images.  Many  early  systems  used  analog  circuits  to  process  image-derived  signals, 
or used optical imaging for parallel processing of images represented in the form of
transparencies. But the ability of general-purpose computers to perform arbitrary operations on
digitized images ensured that digital methods would continue to be used; and the steadily
increasing speed of digital computers is rapidly overcoming the speed advantages of competing
implementations.

This  paper  reviews  research  on  digital  image  and  scene  analysis  through  the  1970’s. 
Sections  2-3  describe  some  of  the  main  areas  of  application  of  image  analysis  and  discuss 
why  image  analysis,  unlike  image  processing,  required  the  development  of  new  concepts  and 
techniques.  Section  4  summarizes  early  work  on  basic  image  analysis  techniques,  including 
segmentation,  property  measurement,  and  structural  description.  Section  5  explains  why 
more powerful techniques were needed to describe images of three-dimensional scenes.
Section 6 lists milestone conferences, books, and journals dealing with the field as a whole or
with  specific  application  areas. 
2  Processing  and  analysis

Initially,  image  processing  had  an  important  advantage  over  image  analysis:  There  already 
existed a well-developed theory of signal processing that generalized more or less
straightforwardly to two-dimensional signals (i.e., to images), and that applied to digital as well
as analog signals. The increasing importance of television and facsimile led to a growing
interest in image communication, with particular emphasis on image bandwidth compression
or coding. Advances in image reproduction (photography, xerography, etc.) and display
led  to  the  development  of  methods  of  measuring  image  quality  and  to  research  on  image 
enhancement  (quality  improvement  by  contrast  stretching,  deblurring,  noise  reduction,  etc.) 
and  restoration  (estimation  and  correction  of  image  degradations);  such  techniques  could 
also  be  used  for  “preprocessing”  images  as  a  preliminary  to  describing  or  classifying  them. 
A  more  recent  branch  of  image  processing  is  the  reconstruction  of  cross-section  images  from 
projections,  as  in  computed  x-ray  tomography. 

For  image  analysis,  on  the  other  hand,  there  was  little  or  no  preexisting  theory.  Basic 
signal  analysis  techniques  such  as  Fourier  analysis  or  matched  filtering  can  be  applied  to 
images;  but  practical  image  analysis  applications,  such  as  those  described  in  Section  3, 
almost  always  require  more  powerful  techniques.  (The  patterns  of  interest  for  describing 
images  are  usually  not  sinusoids,  and  often  cannot  be  detected  by  exact  matching.)  Thus 
progress  on  these  applications  was  accompanied  by  the  invention  of  many  of  today’s  basic 
methods  of  image  analysis,  as  described  in  Section  4. 

3  Motives:  Image  analysis  applications 

General  theories  of  image  analysis  were  slow  to  emerge  because  image  analysis  systems  were 
developed  to  deal  with  specific  classes  of  images  and  to  derive  domain-specific  descriptions  of 
these  images.  Some  of  the  major  areas  of  application  of  image  analysis  are  briefly  discussed 
in  the  following  paragraphs;  references  to  early  work  on  these  areas  will  be  given  in  Section  6. 

a)  Character  recognition.  Computers  deal  extensively  with  alphanumeric  information 
which  is  conventionally  input  by  hand  using  a  keyboard.  This  information  often  already 
exists  in  human-readable  hard-copy  form;  if  the  computer  could  reliably  recognize  the 
characters  (letters,  numbers,  etc.)  in  a  digitized  image  of  the  hard  copy,  keyboard 
input  could  be  eliminated.  Thus  some  of  the  earliest  work  on  image  analysis  was 
aimed  at  developing  effective  methods  of  character  recognition.  [The  task  was  called 
optical  character  recognition  (OCR)  not  because  it  was  implemented  optically,  but  to 
distinguish  it  from  magnetic  ink  character  recognition  (MICR),  in  which  the  characters 
were  printed  in  magnetic  ink  to  make  them  easily  detectable,  and  were  given  special 
shapes  to  make  them  easily  recognizable.]  Character  recognition  problems  vary  widely 
in  difficulty;  cleanly  machine-printed  characters,  especially  if  they  are  in  a  known  font, 
are  relatively  easy  to  recognize,  but  hand-printed  characters  still  pose  problems,  and 
the  recognition  of  cursive  script  is  still  an  active  research  area. 

b)  Microscopy.  Optical  and  electron  microscope  images  are  used  for  many  purposes  in 
such  fields  as  materials  science,  biology,  and  medicine.  The  medical  applications,  in 
particular, involve the routine examination of hundreds of millions of images a year by
pathologists,  hematologists,  and  geneticists,  for  such  purposes  as  chromosome  mapping, 
blood  cell  counting,  and  pap  smear  analysis.  Not  surprisingly,  attempts  to  automate 
some  of  these  tasks  began  as  early  as  the  1950’s,  and  work  in  this  area  is  still  ongoing. 

c) Radiology. The development of three-dimensional medical imaging techniques (computed
tomography, magnetic resonance imaging, and so on) over the past few decades
is  a  major  accomplishment  of  computer  image  processing,  and  provides  important  new 
sources  of  images  for  analysis.  Long  before  such  images  became  available,  hundreds  of 
millions  of  conventional  x-ray  images  a  year  were  being  examined  by  radiologists  (as 
well  as  by  dentists,  engineers,  and  many  others);  but  relatively  little  work  seems  to 
have  been  done  prior  to  the  1970’s  on  the  automation  of  radiographic  image  analysis. 
The  analysis  of  both  conventional  radiographs  and  three-dimensional  medical  images 
continues  to  be  an  active  area  of  research. 

d)  Remote  sensing.  Images  are  the  primary  source  of  information  about  distant  objects; 
thus  image  analysis  plays  a  central  role  in  astronomy.  For  over  a  century,  aerial  images 
of  the  earth’s  surface  have  provided  “bird’s-eye  views”  from  which  valuable  information 
can  be  derived  about  agriculture,  natural  resources,  hydrology,  geology,  geography, 
and  cartography,  and  for  use  in  military  reconnaissance  or  environmental  monitoring. 
As  early  as  the  1950’s,  research  had  begun  on  the  automatic  recognition  of  cultural 
features  (buildings,  roads,  bridges,  ...)  in  aerial  photographs.  Since  the  1960’s,  larger- 
scale  views  of  the  earth  and  the  atmosphere  have  been  provided  by  satellite  images. 
Huge  numbers  of  such  images  have  been  acquired,  but  only  a  small  fraction  of  them 
have  been  examined  in  any  detail;  thus  there  is  a  continuing  need  to  develop  automatic 
techniques  for  extracting  and  analyzing  the  information  that  these  images  contain. 

e)  Other  areas.  Images  are  collected  and  examined  in  many  other  areas  of  science  and 
engineering; thus computer image analysis has many other potential areas of application.
An example of such an area is forensic science; a classical task in this area is the
recognition  of  fingerprints,  and  a  more  difficult  task  is  face  recognition.  Another  area, 
in  which  there  was  considerable  activity  a  few  decades  ago,  is  high  energy  physics.  The 
study  of  subatomic  particles  was  greatly  advanced,  around  the  middle  of  this  century, 
by  the  development  of  devices  such  as  the  cloud  chamber  and  the  bubble  chamber,  in 
which the particles left visible “tracks”. The search for new particles, and the measurement
of their properties, involved the analysis of thousands of images, with the
goal  of  detecting  the  tracks  of  rare  particles,  measuring  their  geometric  properties,  and 
detecting  “events”  (abrupt  changes)  representing  particle  interactions. 

The  types  of  images  that  need  to  be  analyzed  in  these  widely  varying  domains  are  very 
different,  but  the  types  of  analyses  that  need  to  be  performed  on  these  images  have  many 
things in common. Image analysis almost always involves a few basic processes: distinguishing
certain parts of the image (representing characters, blood cells, tumors, wheat fields,
particle tracks, fingerprint ridges, facial features, ...); measuring properties of these parts,
or relations among the parts; and using the values of these properties or relations to classify
or describe the parts, or to describe or classify the image as a configuration of parts.
These processes define the general image analysis/recognition paradigm shown in Figure 3.
In  Section  4  we  will  describe  some  of  the  methods  that  were  developed  to  implement  these 
processes. 

4  Methods:  Image  analysis  algorithms 
4.1  Segmentation 

As  pointed  out  in  the  last  paragraph,  the  images  that  we  are  usually  interested  in  analyzing 
contain  parts  that  represent  visibly  different  entities,  and  the  desired  descriptions  of  the 
images  involve  these  parts.  Thus  segmentation  of  an  image  into  “meaningful”  parts  is  almost 
always  the  first  step  in  any  image  analysis  process. 

In  some  simple  situations,  the  entities  differ  in  lightness;  thus  the  pixels  belonging  to 
image  parts  that  represent  different  entities  have  different  ranges  of  gray  levels.  If  these 
ranges  are  more  or  less  disjoint,  the  image  can  be  segmented  into  parts  by  thresholding  the 
pixel  gray  levels — i.e.,  comparing  the  gray  levels  to  some  reference  value(s),  and  assigning 
them  to  classes  depending  on  which  range  they  lie  in.  The  character  recognition  domain  is 
perhaps  the  most  obvious  example  of  this  situation:  characters  are  usually  much  darker  than 
the  paper  on  which  they  are  printed  or  written.  Thresholding  can  also  provide  meaningful 
segmentations for more complex types of images; for example, in (suitably stained) microscope
images of cells, the nuclei of the cells are generally darker than the cell bodies, which
in  turn  are  darker  than  the  background.  The  suitability  of  thresholding  for  segmenting  an 
image  can  be  determined  by  examining  the  population  of  pixel  gray  levels  in  the  image;  this 
can be done by constructing a bar graph (a histogram) in which each bar corresponds to a
gray  level,  and  its  height  indicates  the  number  of  pixels  having  that  gray  level.  Peaks  in 
this  histogram,  separated  by  valleys,  represent  subpopulations  of  pixels  that  have  distinctive 
ranges  of  gray  levels;  evidently,  a  threshold  corresponding  to  the  bottom  of  a  valley  between 
two  peaks  will  well  separate  the  subpopulations  corresponding  to  the  peaks  [201].  For  a 
survey  of  thresholding  methods  see  [261].  More  general  methods  of  segmentation  by  peak 
(i.e.,  cluster)  detection  in  color  space  or  local  property  value  space  are  described  in  [188,  40]. 
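
As a rough illustration of the histogram-valley idea, the following sketch builds the gray level histogram and places a threshold at the least populated gray level between the two highest peaks. The function names and the crude peak test are assumptions made for this example; they are not the procedures of [201] or of the methods surveyed in [261], and a real histogram would normally be smoothed before peaks are sought.

```python
def gray_histogram(image, levels=256):
    """Count the number of pixels having each gray level."""
    hist = [0] * levels
    for row in image:
        for g in row:
            hist[g] += 1
    return hist


def valley_threshold(hist):
    """Gray level at the bottom of the valley between the two highest peaks."""
    # Crude local-maximum test; adequate only for clean, bimodal histograms.
    peaks = [g for g in range(len(hist))
             if hist[g] > 0
             and (g == 0 or hist[g - 1] < hist[g])
             and (g == len(hist) - 1 or hist[g + 1] <= hist[g])]
    if len(peaks) < 2:
        raise ValueError("histogram does not have two peaks")
    lo, hi = sorted(sorted(peaks, key=lambda g: hist[g], reverse=True)[:2])
    # The least populated gray level strictly between the two peaks.
    return min(range(lo + 1, hi), key=lambda g: hist[g])


# Dark "ink" (levels 1-2) on light "paper" (levels 6-7), with 8 gray levels.
image = [[1, 1, 2, 6],
         [1, 2, 6, 6],
         [1, 6, 7, 6]]
t = valley_threshold(gray_histogram(image, levels=8))
binary = [[1 if g > t else 0 for g in row] for row in image]
```

Pixels above the threshold are labeled 1 (light) and the rest 0 (dark); with more than two peaks, the same idea would be applied to each valley in turn.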

If  the  gray  level  ranges  of  the  image  parts  overlap,  or  vary  from  place  to  place  in  the  image 
because  of  “shading”  (due,  for  example,  to  slowly  varying  illumination),  global  thresholding 
becomes  relatively  useless  as  a  method  of  segmentation,  though  local  thresholding  can  still 
be  used  if  the  gray  levels  of  neighboring  parts  are  disjoint  [33].  More  generally,  image  parts 
are  often  distinguishable  because  there  are  abrupt  jumps  in  gray  level  at  their  boundaries. 
Such  jumps  (edges)  can  be  detected  by  examining  the  image  gray  levels  in  the  neighborhood 
of each pixel and checking for large differences. The highest directional rate of change of
a function f in the neighborhood of a point is called the gradient of f; the usefulness of
the gradient for edge detection was pointed out as early as the 1950’s [147]. (The same
paper discussed the Laplacian and its use for approximate inversion of diffusion blur.) The
magnitude and direction of the gradient of f can be computed from the values of the partial
derivatives of f in two directions; in digital images, analogously, one can use first differences
instead of derivatives. An early example, using differences between adjacent pixels in the two
diagonal directions, can be found in [206]. An important early discussion of edge detection,
which unfortunately appeared only in report form, is [113]; it discusses linear and nonlinear
operations  for  the  detection  of  both  “step”  and  “roof”  edges.  A  surface  fitting  approach  to 
gradient  estimation  is  described  in  [200].  Other  early  approaches  to  step  edge  detection  can 
be  found  in  [124]  (using  best-fitting  step  functions)  and  [229]  (using  differences  of  average 
gray  levels  in  neighborhoods  of  many  sizes,  and  selecting  a  “best”  size  at  each  point).  (In 
[139], edge detection applied to a reduced-resolution image is used to guide the search for
edges in a full-resolution image.) For a statistical treatment of edge detection see [91]; on
evaluation  of  edge  detection  algorithms  see  [1];  and  for  a  survey  of  edge  detection  techniques 
see  [42]. 
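
A digital gradient magnitude based on first differences in the two diagonal directions, of the general kind mentioned above, might be sketched as follows. The function name and the use of the Euclidean combination of the two differences are assumptions of this example; it is not claimed to reproduce the operator of [206] in detail.

```python
import math


def diagonal_gradient(image):
    """Gradient magnitude from first differences along the two diagonals of
    each 2 x 2 block; the result has one row and one column fewer."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 1):
        out_row = []
        for c in range(cols - 1):
            d1 = image[r][c] - image[r + 1][c + 1]     # main diagonal
            d2 = image[r][c + 1] - image[r + 1][c]     # anti-diagonal
            out_row.append(math.hypot(d1, d2))
        out.append(out_row)
    return out


# A vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10],
        [0, 0, 10, 10],
        [0, 0, 10, 10]]
edges = diagonal_gradient(step)
```

The output is large only in the column of 2 x 2 blocks that straddles the step, and zero where the gray level is constant.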

Edges  are  the  most  common  locally  detectable  image  features;  other  types  are  spots, 
curve  ends,  curves  (including  straight  lines),  and  corners.  The  conventional  way  of  detecting 
such  features  is  to  use  higher-order  difference  operators,  but  such  operators  are  not  pattern- 
specific;  for  example,  the  second  difference  in  the  x  direction  may  be  higher  for  a  high- 
contrast  vertical  edge  than  it  is  for  a  low-contrast  vertical  line.  A  better  approach  [229] 
is  to  use  operators  that  incorporate  logical  conditions — for  example,  a  bright  vertical  line 
is  present  in  the  neighborhood  of  pixel  P  only  if  P  has  a  higher  gray  level  than  both  its 
horizontal  neighbors,  and  the  same  is  true  for  both  of  P’s  vertical  neighbors.  Methods  of 
feature  detection  using  basis  functions  are  described  in  [125,  68,  127];  a  general  class  of  local 
operators  is  defined  in  [84].  (The  “morphological”  operations  that  became  popular  in  later 
years originated quite early; on the two-valued case see [141] and [174], and on the grayscale
case  see  [180].)  On  digital  arcs  and  curves  see  [213];  on  digital  straight  lines  see  [215]. 
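
The contrast between a second difference and an operator with logical conditions can be made concrete with a small sketch. Both function names are hypothetical, and the code assumes that the pixel examined is not on the image frame; the logical test is a direct transcription of the condition stated above, not the exact operator of [229].

```python
def second_difference_x(image, r, c):
    """Magnitude of the second difference in the x direction at (r, c)."""
    return abs(image[r][c - 1] - 2 * image[r][c] + image[r][c + 1])


def is_bright_vertical_line(image, r, c):
    """True only if the pixel at (r, c) and both of its vertical neighbors
    each exceed both of their own horizontal neighbors."""
    return all(image[rr][c] > image[rr][c - 1] and image[rr][c] > image[rr][c + 1]
               for rr in (r - 1, r, r + 1))


faint_line = [[5, 6, 5],
              [5, 6, 5],
              [5, 6, 5]]
strong_edge = [[0, 100, 100],
               [0, 100, 100],
               [0, 100, 100]]
```

At the center pixel the second difference is 2 for the faint line but 100 for the strong edge, so a threshold on it would favor the edge; the logical test accepts the line and rejects the edge regardless of contrast.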

Local  operators  can  detect  features  in  the  neighborhoods  of  individual  pixels,  but  cannot 
link  the  local  detections  that  correspond  to  an  entire  curve  or  an  entire  region  boundary. 
Such  global  features  can  be  extracted  by  a  search  process  that  finds  sets  of  feature  pixels 
that  are  optimal  with  respect  to,  e.g.,  both  gray  level  contrast  and  geometric  smoothness 
[173,  163,  164].  If  a  global  feature  has  a  simple  geometric  shape — for  example,  if  it  is 
straight — it  can  be  converted  to  a  local  feature  (a  “spot”  or  cluster)  by  mapping  the  image 
into  a  suitably  defined  parameter  space;  for  example,  collinear  feature  points  all  map  into 
the  same  point  in  a  line  parameter  space  [121,  47]. 
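
The mapping of collinear feature points into a single point of a line parameter space can be sketched with an accumulator array. The example below assumes the normal parameterization x cos(theta) + y sin(theta) = rho, which is only one possible choice of line parameters, and the function name and quantization steps are likewise assumptions of this sketch.

```python
import math
from collections import Counter


def line_votes(points, n_theta=180, rho_step=1.0):
    """For each feature point (x, y), add one vote to every quantized
    (theta, rho) cell whose line passes through that point."""
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(t, round(rho / rho_step))] += 1
    return acc


# Four points on the vertical line x = 3, plus one stray point.
points = [(3, 0), (3, 1), (3, 2), (3, 5), (7, 4)]
acc = line_votes(points)
```

The cell for theta = 0, rho = 3 collects one vote from each of the four collinear points, so the line shows up as a local cluster of votes; because of quantization, neighboring cells may collect the same number of votes, and some form of peak detection is still needed.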

Image  parts  of  known  shapes  can  also  be  detected  by  “template  matching”.  Matching 
methods  are  also  studied  in  image  processing;  but  many  such  methods,  particularly  those 
involving  inexact  matching,  were  developed  in  an  image  analysis  context  [7,  10,  59,  6,  14, 
256, 230, 273, 274, 202].

An  image  can  also  be  segmented  into  connected  components:  maximal  regions  in  which 
neighboring  pixels  have  (almost)  the  same  gray  level  [176],  or  more  generally,  into  regions 
that  are  good  fits  to  simple  functions  [191];  pairs  of  adjacent  regions  can  then  be  merged 
if  they  are  not  separated  by  strong  edges  or  their  union  is  still  simple,  or  more  generally,  if 
merging  them  results  in  a  “better”  partition  of  the  image  [22,  55,  93,  276,  119,  247,  120,  30]. 
For  a  survey  of  region-based  segmentation  techniques  see  [280]. 

Multiscale methods of image processing and analysis, as used in feature detection,
segmentation, matching, etc., are conveniently implemented if a multiscale (“pyramid”) image
representation is used [246]. On the related idea of variable-scale (“quadtree”) image
representation see [142].
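
One simple way to build such a multiscale representation is to average non-overlapping 2 x 2 blocks repeatedly, as in the following sketch. The averaging rule and the function names are assumptions of this example and are not claimed to be the construction used in [246].

```python
def reduce_once(image):
    """Halve the resolution by averaging non-overlapping 2 x 2 blocks
    (a trailing odd row or column is dropped)."""
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4
             for c in range(0, len(image[0]) - 1, 2)]
            for r in range(0, len(image) - 1, 2)]


def build_pyramid(image):
    """Return the list [full resolution, half, quarter, ...]."""
    levels = [image]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(reduce_once(levels[-1]))
    return levels


image = [[0, 0, 8, 8],
         [0, 0, 8, 8],
         [4, 4, 4, 4],
         [4, 4, 4, 4]]
pyramid = build_pyramid(image)
```

An operation applied at a coarse level of the pyramid examines a large area of the original image at small cost, and its results can be used to guide processing at the finer levels.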

Methods  of  feature  detection  and  segmentation  are  reviewed  in  [210],  Ch.  8  and  in  [208, 
152, 200, 205, 101, 223]. (It should be pointed out that in pattern recognition, any property
of  a  pattern  that  is  used  for  classification  purposes  is  called  a  “feature”;  but  in  image  analysis, 
the  term  “feature”  refers  to  a  locally  detectable  pattern  in  an  image.)  On  interpretation- 
guided  segmentation  see  [248];  on  convergent  evidence  in  segmentation  see  [169].  Iterative 
methods  of  image  segmentation,  or  more  generally  of  labelling  image  parts,  are  discussed  in 
[224,218]. 

4.2  Properties 

A  wide  variety  of  properties  of  an  image  or  of  its  parts  can  be  defined.  The  lightness 
(or  darkness)  and  “contrastiness”  of  an  image  are  described  by  the  mean  and  standard 
deviation  of  the  pixel  gray  levels  in  the  image;  these  statistics  can  be  computed  from  the 
image’s  histogram.  An  image  can  be  normalized  with  respect  to  linear  transformations  of  its 
grayscale  by  shifting  and  scaling  its  gray  levels  so  as  to  standardize  the  values  of  their  mean 
and  standard  deviation. 
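
The grayscale normalization just described can be sketched in a few lines. The function name and the target values (mean 0, standard deviation 1) are assumptions of this example, and the statistics are computed directly from the pixels rather than from the histogram.

```python
def normalize_grayscale(image, target_mean=0.0, target_std=1.0):
    """Shift and scale the gray levels so that their mean and standard
    deviation take standard values."""
    pixels = [g for row in image for g in row]
    mean = sum(pixels) / len(pixels)
    std = (sum((g - mean) ** 2 for g in pixels) / len(pixels)) ** 0.5
    if std == 0:
        raise ValueError("a constant image has no contrast to rescale")
    scale = target_std / std
    return [[(g - mean) * scale + target_mean for g in row] for row in image]


a = [[10, 20], [30, 40]]
b = [[3 * g + 7 for g in row] for row in a]     # same image, altered grayscale
na, nb = normalize_grayscale(a), normalize_grayscale(b)
```

The two normalized images agree up to rounding error, since b differs from a only by a linear grayscale transformation with positive gain.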

Textural  properties  of  an  image  can  be  described  using  statistics  of  higher-order  gray  level 
distributions of its pixels; for example, for any given spatial displacement δ, a second-order
distribution is defined by the numbers of pairs of pixels at separation δ that have given
pairs  of  gray  levels  [131,  104].  Alternatively,  textural  properties  can  be  described  using  first- 
order  statistics  of  the  values  of  local  properties  measured  at  every  point  of  the  image  [207]; 
second-order  statistics  of  local  property  values  can  also  be  used  [43].  Still  another  approach 
is  to  segment  the  image  into  microregions  (“texels”)  and  use  statistics  of  properties  of  these 
regions  [156].  Reviews  of  methods  of  texture  analysis  can  be  found  in  [109,  103].  On  the 
statistical  analysis  of  spatial  data  see  [15],  and  on  fractal  models  see  [157].  An  early  meeting 
on  texture  analysis  was  [300],  and  a  workshop  on  (statistical)  image  modeling  was  [222]. 
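
A second-order gray level distribution for a given displacement can be tabulated as in the following sketch. The function name and the representation of the displacement as a (row, column) offset are assumptions of this example.

```python
from collections import Counter


def cooccurrence(image, dr, dc):
    """Count the pairs of gray levels (g1, g2) occurring at pairs of pixels
    separated by the displacement (dr, dc)."""
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    return counts


# Vertical stripes: gray levels alternate horizontally, repeat vertically.
stripes = [[0, 1, 0, 1],
           [0, 1, 0, 1]]
horizontal = cooccurrence(stripes, 0, 1)
vertical = cooccurrence(stripes, 1, 0)
```

For the striped pattern, the horizontal displacement yields only unequal pairs and the vertical displacement only equal pairs; statistics computed from such tables are what distinguish one texture from another.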

Any  linear  property  of  an  image  is  a  weighted  sum  of  its  pixel  values  ([210],  Ch.  7). 
Moments  are  an  important  class  of  linear  properties  in  which  the  weights  are  monomials  of 
the form x^i y^j [122, 123, 80, 5]. An image can be normalized with respect to translation,
rotation, and scale by shifting, rotating, and rescaling it so as to standardize the values of
its first- and second-order moments (1 ≤ i + j ≤ 2). But many important image properties
cannot  be  expressed  as  linear  combinations  of  local  properties  [172]. 
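
The moments and their use for translation normalization can be illustrated as follows. The convention that x is the column index and y the row index, and the function names, are assumptions of this sketch; only the translation step of the normalization is shown.

```python
def moment(image, i, j):
    """Moment of order (i, j): the sum over all pixels of x**i * y**j times
    the gray level, with x the column index and y the row index."""
    return sum((x ** i) * (y ** j) * g
               for y, row in enumerate(image)
               for x, g in enumerate(row))


def centroid(image):
    """Center of gravity, obtained from the zeroth- and first-order moments."""
    m00 = moment(image, 0, 0)
    return moment(image, 1, 0) / m00, moment(image, 0, 1) / m00


def central_moment(image, i, j):
    """Moment of order (i, j) taken about the centroid."""
    cx, cy = centroid(image)
    return sum(((x - cx) ** i) * ((y - cy) ** j) * g
               for y, row in enumerate(image)
               for x, g in enumerate(row))


blob = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0]]
shifted = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 1, 1, 0],
           [0, 0, 1, 1, 0]]
```

Moving the origin to the centroid makes the first-order moments zero, and the second-order moments about the centroid are then the same for the blob and for its shifted copy.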

Image  parts  can  also  be  described  by  a  wide  variety  of  geometric  properties,  and  can 
be  decomposed  into  subparts  based  on  geometric  criteria.  For  example,  an  image  part  can 
be  decomposed  into  its  connected  components  (maximal  connected  subsets)  [211,  216].  A 
connected image part S whose complement S̄ is also connected is called simply connected; if
S̄ is not connected, the components of S̄ that are surrounded by S are called holes in S.
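
Connected components and holes can be found by a simple search over neighboring pixels, as in the sketch below. Several choices here are assumptions of the example rather than statements about [211, 216]: the function names, the use of 8-neighbors for S and 4-neighbors for its complement, and the simplification that S does not touch the image frame, so that a component of the complement is a hole exactly when it does not reach the frame.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def components(mask, value, neighbors):
    """Connected components, as sets of (row, col), of the pixels equal to value."""
    rows, cols = len(mask), len(mask[0])
    seen, result = set(), []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] != value or (r0, c0) in seen:
                continue
            comp, queue = {(r0, c0)}, deque([(r0, c0)])
            seen.add((r0, c0))
            while queue:                       # breadth-first search from the seed
                r, c = queue.popleft()
                for dr, dc in neighbors:
                    p = (r + dr, c + dc)
                    if (0 <= p[0] < rows and 0 <= p[1] < cols
                            and mask[p[0]][p[1]] == value and p not in seen):
                        seen.add(p)
                        comp.add(p)
                        queue.append(p)
            result.append(comp)
    return result


def holes(mask):
    """Components of the complement that do not reach the image frame."""
    rows, cols = len(mask), len(mask[0])
    return [comp for comp in components(mask, 0, N4)
            if all(0 < r < rows - 1 and 0 < c < cols - 1 for r, c in comp)]


ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
```

The ring is a single connected component with one hole, so it is connected but not simply connected.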

The set of pixels of S that have neighbors in a given component of S̄ is called a border B of
S. Starting at any pixel P of a border B, an algorithm can be defined [227] that successively
visits all the pixels of B and returns to P. The succession of moves from neighbor to neighbor
made by this algorithm defines the chain code of B [62, 64, 65, 66]. The smoothed direction
of the border defines its slope, and the rate of change of this direction defines its curvature.
A  border  can  be  decomposed  into  a  succession  of  convex  and  concave  parts  in  which  its 
curvature  is  positive  or  negative  [63].  The  shapes  of  borders  (i.e.,  of  closed  curves)  can  be 
analyzed  in  many  ways;  for  examples  see  [12,  203,  279,  225,  195].  (Fourier  analysis  of  the 
shape of a border seems to have been first used in 1950’s reports by Joseph on the AN/GSQ-14
Electrophotographic Viewer.) A book on shape description is [21]; a survey of shape
analysis  algorithms  is  [193]. 
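
Given the pixels of a border in the order in which a border following algorithm visits them, the chain code and a crude measure of turning can be computed as sketched below. The numbering of the eight directions (0 for a move to the right, increasing counterclockwise, with rows numbered downward) and the function names are assumptions of this example; the border following step itself is not shown, and no smoothing is applied.

```python
# Moves (d_row, d_col) for direction codes 0..7: right, up-right, up, ...
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]


def chain_code(border):
    """Direction codes of the moves between successive pixels of a closed
    border, given as an ordered list of (row, col) pixels."""
    codes = []
    for k, (r, c) in enumerate(border):
        r2, c2 = border[(k + 1) % len(border)]
        codes.append(DIRECTIONS.index((r2 - r, c2 - c)))
    return codes


def turns(codes):
    """Change of direction between successive moves, in multiples of 45
    degrees, mapped into the range -4..3."""
    return [((b - a + 4) % 8) - 4 for a, b in zip(codes, codes[1:] + codes[:1])]


# Border of a 2 x 2 block, visited clockwise as displayed.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
codes = chain_code(square)
```

For the square, every turn has the same sign, as expected for a convex border; a change of sign in the (smoothed) sequence of turns would mark the transition between a convex and a concave part.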

Another algorithm can be defined that computes the intrinsic distance [228], i.e. the
smallest number of moves within S, from each pixel of S to the nearest pixel of S̄; the result
of  this  computation  is  called  the  distance  transform  of  S  [227].  The  set  of  local  maxima 
of  the  distance  transform  is  called  the  medial  axis  of  S  [19,  175].  A  connected  S  is  called 
elongated  if  its  area  is  much  greater  than  the  maximum  value  of  its  distance  transform. 
Elongated  parts  of  S  can  be  extracted  by  shrinking  S  (deleting  pixels  with  low  distance 
values)  and  then  reexpanding  it  (i.e.,  shrinking  its  complement)  by  the  same  amount;  the 
resulting S' must be a subset of S, and sufficiently large connected components of S − S'
must be elongated parts of S (which can be of arbitrary thickness, depending on how much
shrinking and reexpanding is needed to detect them). Similarly, expanding and reshrinking
S gives a superset S" of S, and large connected components of S" − S must have arisen
from  clusters  of  parts  of  S  [174,  210].  Algorithms  can  be  defined  for  thinning  an  elongated  S 
into  a  connected  skeleton  (e.g.,  [114]).  The  skeleton  can  be  decomposed  into  branches  that 
correspond  to  protrusions  of  S  [20]. 
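
The distance transform and the shrinking and reexpanding operations can be sketched as follows. The use of moves between 4-neighbors, the function names, and the breadth-first formulation are assumptions of this example; it is not the sequential algorithm of [227], and it ignores anything outside the array.

```python
from collections import deque

INF = float("inf")
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def distance_transform(mask):
    """Number of 4-neighbor moves from each pixel of S (value 1) to the
    nearest pixel of the complement (value 0, which gets distance 0)."""
    rows, cols = len(mask), len(mask[0])
    dist = [[0 if v == 0 else INF for v in row] for row in mask]
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if mask[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        for dr, dc in N4:
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols and dist[r2][c2] == INF:
                dist[r2][c2] = dist[r][c] + 1
                queue.append((r2, c2))
    return dist


def shrink(mask, k):
    """Delete the pixels of S whose distance value is at most k."""
    dist = distance_transform(mask)
    return [[1 if v and d > k else 0 for v, d in zip(mrow, drow)]
            for mrow, drow in zip(mask, dist)]


def expand(mask, k):
    """Expand S by k, i.e. shrink its complement by k."""
    inverted = [[1 - v for v in row] for row in mask]
    return [[1 - v for v in row] for row in shrink(inverted, k)]


def residue(mask, k):
    """S minus the result of shrinking and then reexpanding S by k."""
    reopened = expand(shrink(mask, k), k)
    return [[1 if s and not o else 0 for s, o in zip(srow, orow)]
            for srow, orow in zip(mask, reopened)]


# A 3 x 3 blob with a tail one pixel wide.
blob = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0]]
left_over = residue(blob, 1)
```

Shrinking by 1 removes the tail entirely, so it cannot be recovered by reexpanding, and most of it reappears as a three-pixel component of the residue. The residue also contains two isolated corner pixels of the blob, which is why only sufficiently large components are taken to be elongated parts.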

S is called convex [240] if any line segment joining two pixels of S lies entirely in S; any
such S must be simply connected. The smallest convex set containing S is called the
convex hull of S, and the connected components of the difference between the convex hull
and S are called the concavities of S.

Image  parts  are  often  difficult  to  define  precisely;  it  may  be  advantageous  to  regard  them 
as  fuzzy  subsets  of  the  image  [278,  200].  Definitions  of  image  part  properties  can  often  be 
generalized  straightforwardly  to  fuzzy  image  parts  [278,  24,  219]. 

4.3  Classification  and  description 

An  image  or  image  part  can  be  classified  on  the  basis  of  the  values  of  its  properties.  The 
process  of  classifying  a  pattern  (e.g.,  an  image)  based  on  property  values  is  studied  in  the 
field  of  pattern  recognition.  If  the  probability  distribution  of  the  values  of  the  properties 
is  known  for  each  class,  and  the  a  priori  probabilities  of  the  classes  are  also  known,  Bayes’ 
theorem  can  be  used  to  compute  the  probability  that  a  given  pattern  (having  an  observed 
set  of  property  values)  belongs  to  each  class.  This  statistical  pattern  recognition  paradigm  is 
not  specific  to  patterns  derived  from  images;  it  will  not  be  discussed  further  here. 

An  image  can  be  described  as  consisting  of  parts  that  have  given  properties  and  that  are 
related  to  one  another  in  various  ways;  this  is  the  structural  pattern  recognition  paradigm 
[192].  Relationships  that  may  be  of  interest  for  descriptive  purposes  include  relative  values 
of properties (darker than, larger than, ...); set-theoretic and topological relationships
(contained in, intersecting, adjacent to, surrounded by, ...); and relationships of relative position
(near/far, above/below/right/left, between; on an early attempt to define such relationships
see  [268]).  A  structural  description  of  an  image  in  terms  of  parts,  properties,  and  relations 
can  be  represented  by  a  labeled  graph  in  which  the  nodes  correspond  to  the  parts;  each 
node  is  labelled  with  its  property  values;  and  the  arcs  represent  relations  between  the  parts, 
labeled  with  their  values.  For  an  early  discussion  of  such  relational  descriptions  see  [12]. 
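
A relational description of this kind might be held in a small data structure such as the one sketched below. The part names, the property names mean_gray and area, and the function name are all hypothetical, and it is assumed that a lower gray level means a darker part.

```python
def relational_description(parts, stated_relations):
    """Labeled graph: nodes carry property values; arcs carry the relations
    that hold from the first part of a pair to the second."""
    arcs = {pair: set(labels) for pair, labels in stated_relations.items()}
    for a in parts:
        for b in parts:
            if a == b:
                continue
            # Relations derived from relative property values.
            if parts[a]["mean_gray"] < parts[b]["mean_gray"]:
                arcs.setdefault((a, b), set()).add("darker than")
            if parts[a]["area"] > parts[b]["area"]:
                arcs.setdefault((a, b), set()).add("larger than")
    return {"nodes": parts, "arcs": arcs}


parts = {
    "nucleus": {"mean_gray": 40, "area": 120},
    "cell body": {"mean_gray": 110, "area": 900},
}
graph = relational_description(
    parts, {("nucleus", "cell body"): ["contained in"]})
```

Here the topological relation is supplied by hand, while the relations of relative value are computed from the node labels; matching such a graph against stored graphs is one way of classifying the image as a configuration of parts.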

Describing  an  image  as  a  configuration  of  parts,  which  may  in  turn  be  composed  of 
subparts, etc., is analogous to “parsing” a sentence into clauses, phrases, etc. [171]. This
idea  led  to  a  number  of  attempts  to  define  picture  grammars  that  could  be  used  to  parse  (or 
generate)  classes  of  pictorial  patterns,  or  (labelled)  graph  grammars  for  classes  of  relational 
descriptions.  For  some  early  examples  of  such  approaches  see  [140,  181,  182,  170,  54,  37, 
36,  183,  235,  196].  There  were  many  additional  papers  on  this  subject  in  the  1970’s;  we 
mention  here  only  a  few  major  examples  of  work  related  to  this  area  [79,  87,  88,  89].  An 
early  conference  on  the  subject  was  [289];  a  book  on  the  subject  is  [221].  (The  still  ongoing 
series  of  International  Workshops  on  Graph  Grammars  began  in  1978  [35];  for  a  book  on  the 
subject  see  [179].)  Picture  languages  can  also  be  characterized  as  the  classes  of  pictures  that 
are  accepted  by  various  types  of  two-dimensional  automata  (e.g.,  see  [136]).  These  concepts 
led  to  extensive  work  on  syntactic  pattern  recognition  starting  in  the  1970’s;  basic  references 
on  this  subject  are  [71,  72,  73,  82]  (see  also  [194]). 

Many other image analysis techniques were studied during the 1960’s and 70’s; the
author’s first ten bibliographies on image processing and analysis [209, 212, 214] contain nearly
5000  references.  This  review  is  restricted  to  highlights  and  milestones;  it  does  not  claim  to 
be  comprehensive. 

5  Computer  vision:  scene  analysis 

In  several  of  the  areas  of  application  of  image  analysis  described  in  Section  3,  the  “objects” 
that  appear  in  the  images  are  essentially  two-dimensional;  this  is  obvious  in  the  case  of 
character  recognition,  where  the  objects  are  marks  on  a  flat  paper  surface,  and  it  is  also  true 
in  microscopy,  where  because  of  the  extremely  shallow  depth  of  field  of  a  microscope,  the 
image  shows  a  flat  “slice”  (an  “optical  section”)  of  the  object.  In  other  areas,  the  objects 
are  three-dimensional,  but  they  are  viewed  from  a  known  direction,  so  that  the  images 
show  known  projections  of  the  objects;  this  is  the  case  in  conventional  radiology  and  in 
downward-looking  remote  sensor  imagery.  [In  the  remote  sensing  case,  the  images  are  also 
taken  from  a  great  distance,  so  terrain  relief  can  sometimes  be  regarded  as  negligible.]  These 
considerations  allow  us  to  regard  a  two-dimensional  image  as  an  adequate  representation  of 
the  scene  containing  the  objects,  and  to  analyze  the  scene  using  the  image  analysis  paradigm 
of  Figure  3. 

When  a  scene  is  imaged  from  an  unknown  viewpoint  which  is  not  very  distant  relative 
to  the  sizes  of  the  objects  in  the  scene,  the  image  can  no  longer  be  regarded  as  an  adequate 
representation  of  the  scene,  since  it  no  longer  shows  a  known  projection  of  the  objects,  and 
objects  may  also  (partially)  occlude  one  another.  In  such  situations,  the  paradigm  of  Figure  3 
is  evidently  inadequate.  This  became  apparent  when  the  artificial  intelligence  laboratories  at 
such  institutions  as  MIT,  Stanford,  and  SRI  were  established  and  began  to  work  on  problems 
of  robot  vision;  a  robot  must  manipulate,  or  navigate  among,  objects  that  are  close  to  it  and 
lie  in  arbitrary  directions  relative  to  it. 

Today’s computer vision systems deal with sequences of images of dynamic scenes, containing
moving objects and obtained by moving cameras. The images may be obtained by
multiple cameras (allowing scene depth to be determined by measuring the positions of the
images of a scene point in images obtained by different cameras), or may be obtained by
range sensors (which allow the depths of scene points to be measured directly); but for the
moment we shall continue to consider only single, conventional images. Even in this restricted
situation, a more complex paradigm is needed to describe the process of inferring a description
of a three-dimensional scene from an image. Such a paradigm is shown schematically
in Figure 4.

The  scene  analysis  paradigm  of  Figure  4  differs  from  the  image  analysis  paradigm  of 
Figure  3  in  two  principal  ways: 

a)  It  incorporates  processes  that  recover  information  about  relative  depth  from  the  image, 
or from sets of local features detected in the image. If simple assumptions about illumination
and surface reflectivity are satisfied, the orientation of a homogeneous surface
can  be  inferred  from  gray  level  variations  across  its  image  (“shape  from  shading”). 
Similarly,  the  orientation  of  a  homogeneously  textured  surface  can  be  inferred  from 
variations  in  the  spacings  of  local  features  in  its  image  (“shape  from  texture”).  The 
occlusion  of  one  surface  by  another  can  be  inferred  from  the  shapes  of  the  junctions 
at  which  region  boundaries  meet  (“shape  from  contour”);  for  example,  the  presence  of 
a  T-junction  suggests  that  the  surface  giving  rise  to  the  region  above  the  horizontal 
line  of  the  T  is  in  front  of  the  surfaces  giving  rise  to  the  regions  below  it,  because  the 
boundary  between  the  latter  two  regions  (the  leg  of  the  T)  appears  to  be  occluded. 
Note  that  this  “recovered”  information  is  viewpoint-dependent — i.e.,  it  describes  depth 
relative to the observer (the camera). Such information is called 2-1/2-dimensional because
it relates only to the parts of the scene that are visible to the observer, and not
to  the  hidden  parts  of  the  scene  (occluded  objects;  backs  of  objects). 

b)  If  2-1/2-dimensional  information  is  available,  the  image  can  be  segmented  into  regions 
that  correspond  to  connected  visible  surface  patches  in  the  scene.  The  shapes  of  these 
image  regions  are  strongly  dependent  on  viewpoint,  because  of  both  occlusion  and 
perspective  distortion;  thus  the  identities  of  the  objects  that  appear  in  the  scene,  and 
their  spatial  layout,  cannot  be  directly  derived  from  the  properties  of  and  relationships 
among  the  image  regions.  However,  if  the  set  of  possible  objects  is  limited  (and  known), 
it is possible in principle to determine which of these objects, seen from which viewpoint,
could have given rise to the observed set of image regions; this “back-projection”
process  allows  us  to  infer  the  identities  and  “poses”  of  the  visible  objects,  as  well  as 
the  layout  of  these  objects  in  the  scene. 
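
The back-projection idea in (b) can be illustrated by a brute-force hypothesize-and-test search. The following Python sketch makes strong simplifying assumptions that are not part of the text: each object is a small set of 3D vertices, the viewpoint is a single rotation about the vertical axis, projection is orthographic, and the correspondence between model vertices and image points is known. The names project, mismatch, back_project, and MODELS are illustrative only.

```python
import math


def project(points3d, yaw_deg):
    """Rotate about the vertical (y) axis, then project orthographically onto the x-y plane."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [(c * x + s * z, y) for x, y, z in points3d]


def mismatch(a, b):
    """Sum of squared distances between corresponding 2D points."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))


def back_project(observed, models, yaw_step_deg=5):
    """Try every (object, viewpoint) hypothesis; return (error, name, yaw) of the best one."""
    best = None
    for name, vertices in models.items():
        if len(vertices) != len(observed):
            continue  # this object cannot explain the observed points
        for yaw in range(0, 360, yaw_step_deg):
            err = mismatch(project(vertices, yaw), observed)
            if best is None or err < best[0]:
                best = (err, name, yaw)
    return best


# Hypothetical object models (assumed for illustration): four vertices each.
MODELS = {
    "wedge": [(0, 0, 0), (2, 0, 0), (0, 1, 0), (0, 0, 1)],
    "slab": [(0, 0, 0), (3, 0, 0), (0, 0.5, 0), (0, 0, 2)],
}
```

For example, back_project(project(MODELS["slab"], 40), MODELS) identifies the slab at a yaw of 40 degrees. The exhaustive search is feasible only because the set of objects and viewpoints is tiny; with unknown correspondences, occlusion, and six pose parameters the hypothesis space grows very quickly.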

Early  research  on  three-dimensional  scene  analysis  dealt  with  scenes  containing  simple 
geometrical  objects  (the  “blocks  world”).  The  first  Ph.D.  dissertation  on  this  subject,  at  the 
MIT Artificial Intelligence Laboratory, was that of Roberts (see [206]). (For early papers on
robot  vision  at  MIT,  Stanford,  Edinburgh,  and  SRI  see  [166,  61,  197,  57,  269,  11,  53].)  The 
knowledge  that  a  scene  consists  of  polyhedral  objects  can  be  used  as  a  guide  in  segmenting 
images  of  the  scene  [90,  236,  237].  In  the  mid-60’s  an  attempt  was  made  at  MIT  to  develop 
a  system  that  could  recognize  many  different  common  objects  (hand  tools)  [190],  but  this 
effort  was  not  successful. 

Research on recovery techniques was also initiated in the 1960’s. The first Ph.D. dissertation
on shape from shading, also at MIT, was that of Horn in 1970 (see [117]; for his
later work on the subject see [118]). The concept of shape from texture was introduced
in a 1950 book on visual perception by Gibson [77]; early attempts at implementing this
concept  were  not  entirely  successful  [25],  but  better  results  were  achieved  in  the  1970’s  [8]. 
The  first  Ph.D.  dissertation  on  shape  from  contour,  developing  rules  for  identifying  regions 
that  (probably)  belong  to  different  polyhedral  objects,  was  that  of  Guzman  at  MIT  [94]; 
a  theoretical  basis  for  Guzman’s  rules  was  developed  in  the  early  1970’s  by  Huffman  [126] 
and  by  Clowes  [38]  (see  also  [155]),  and  was  extended  by  Waltz  to  allow  for  contours  due  to 
cracks  or  shadows  [258].  Guzman  also  formulated  rules  about  contours  of  curvilinear  objects 
[95];  for  systematic  treatments  of  such  rules  see  [28,  244].  Another  approach  to  shape  from 
contour,  involving  the  inference  of  surface  orientation  from  contour  shape  (or  spacing,  as 
in  shape  from  texture)  was  investigated  by  Stevens  [244].  A  general  treatment  of  recovery 
techniques  is  given  in  [13]. 

During the 1970’s, Marr at MIT formulated a paradigm for visual information representation
that involved 2D feature analysis (the “primal sketch”), relative depth recovery
(the  “2-1/2-D  sketch”),  and  object  recognition  using  3D  object  models  [159,  160,  161].  For 
collections  of  1970’s  MIT  papers  on  scene  analysis  see  [270,  272].  Another  approach  to 
model-based  object  recognition  is  described  in  [23];  a  system  for  analyzing  images  of  natural 
scenes  is  described  in  [102]. 

Quantitative  depth  information  about  a  scene  can  be  obtained  by  stereo  triangulation 
or  by  direct  range  sensing.  Early  work  on  automation  of  stereomapping  is  reviewed  in 
[50];  for  robot  vision  purposes,  more  powerful  methods  were  needed  to  achieve  short-range 
stereopsis  [99].  Important  models  for  human  stereopsis  also  appeared  during  the  1970’s  (e.g., 
[132,  162]),  and  “photometric  stereo”,  in  which  surface  orientation  information  is  derived  by 
comparing  the  shading  in  two  or  more  images  taken  from  the  same  camera  position  under 
different  illuminations,  was  introduced  in  1979  [275].  Patterned  illumination  became  widely 
used  for  range  imaging  (e.g.,  [266,  267]).  An  early  study  of  range  image  analysis  was  [186]; 
applications  to  the  description  and  recognition  of  objects  composed  of  “generalized  cylinders” 
are  described  in  [4,  185].  The  first  meeting  on  representation  of  three-dimensional  objects 
was  held  in  1979  [308]. 
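
To make the photometric stereo idea concrete, the following Python sketch recovers the surface normal and albedo at one image point from three intensities. It assumes a Lambertian surface (intensity equals albedo times the dot product of the unit light direction and the unit normal), three known, non-coplanar light directions, and no shadows. These assumptions and all function names are illustrative and are not claimed to reproduce the method of [275].

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m x = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    solution = []
    for col in range(3):
        # Replace one column of m by the right-hand side.
        mc = [[rhs[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        solution.append(det3(mc) / d)
    return solution


def photometric_stereo(lights, intensities):
    """Return (unit normal, albedo) at one pixel from three lights and three intensities.

    Assumed model: intensities[k] = albedo * dot(lights[k], normal).
    """
    g = solve3(lights, intensities)  # g = albedo * normal
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0.0:
        raise ValueError("all intensities are zero; normal is undetermined")
    normal = [v / albedo for v in g]
    return normal, albedo
```

Because the camera does not move, the three intensities belong to the same scene point, so no correspondence problem arises; this is the main contrast with stereo triangulation.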

The first Ph.D. dissertation on visual motion analysis based on feature point correspondences
was that of Ullman at MIT in 1977 [251]. A 1979 book on visual perception by Gibson
[78]  (see  also  his  1950  book  [77])  emphasized  the  importance  of  “optical  flow”  in  a  moving 
observer’s  perception  of  its  environment.  An  early  example  of  dynamic  scene  analysis  using 
optical flow is [56]; on motion-based image segmentation see [128, 129]. Koenderink and
van Doorn extensively studied the structure of the parallax field generated by motion relative
to a solid body (e.g., [144, 145]). Two surveys of early work on image sequence analysis
appeared in 1978 [165, 178], and the first workshop on the analysis of time-varying imagery
was  held  in  1979  [2]. 

6  Milestones 

Early papers on image analysis appeared in general electrical engineering or computing conferences.
By the 1960’s, specialized paper collections and workshop proceedings on pattern
recognition began to appear [134, 259, 283, 284, 138, 70, 260, 293, 49, 295, 29, 32], leading
to the initiation of the International Conferences on Pattern Recognition in the early 1970’s;
most of the papers at these conferences continue to deal with image analysis. Papers on both
image  and  scene  analysis  also  appeared  in  the  Machine  Intelligence  Workshops,  initiated  in 
the  mid-1960’s,  and  the  International  Joint  Conferences  on  Artificial  Intelligence,  initiated 
at  the  end  of  the  1960’s. 

Conferences  on  image  processing  and  analysis  also  began  to  be  held  in  the  1960’s  [281, 
137,  39,  31,  158,  86,  153,  287,  187,  288,  26,  290,  291,  168,  294,  143,  254,  52,  239,  296, 
265,  177,  100,  303,  151,  305,  105,  76,  243,  250,  60].  A  series  of  workshops  on  Automatic 
Imagery  Pattern  Recognition  (initially:  Automatic  Photointerpretation  and  Recognition), 
initiated  in  1970,  is  still  ongoing.  The  annual  IEEE  Conferences  on  Pattern  Recognition  and 
Image  Processing  (later  retitled  Computer  Vision  and  Pattern  Recognition)  were  initiated 
in  1977,  as  were  the  annual  SPIE  Conferences  on  Applications  of  Digital  Image  Processing 
and  the  DARPA  Image  Understanding  Workshops  (initially  semiannual,  later  annual  and 
sesquennial). 

The  first  two  of  an  ongoing  series  of  bibliographies  on  image  processing  and  analysis 
appeared  in  1969  and  1973  [209,  212];  they  have  appeared  annually  since  then  [214],  and 
now  deal  only  with  image  analysis  and  computer  vision.  A  late  1970’s  survey  paper  is  [220]. 

The  first  book  on  image  processing  and  analysis  appeared  in  1969  [210];  a  book  on  pattern 
recognition  and  scene  analysis  appeared  in  1973  [48].  Other  books  published  in  the  1970’s 
dealt, at least in part, with image or scene analysis [252, 226, 83, 192, 271, 16, 198, 27,
96, 272]; a collection of important reprints is [3]. During the 1970’s the major electrical
engineering and computing journals published special issues devoted to image analysis [106,
97, 74].

The  first  journal  on  pattern  recognition  was  started  in  1968  [150];  the  first  journal  on 
artificial  intelligence,  in  1970  [167];  and  the  first  journal  on  computer  graphics  and  image 
processing  (including  image  analysis),  in  1972  [67].  The  first  IEEE  journal  in  the  field,  the 
Transactions  on  Pattern  Analysis  and  Machine  Intelligence,  was  started  in  1979;  many  other 
journals  were  started  during  the  1980’s  and  90’s.  [A  graphics  journal  was  started  in  1975, 
but  the  IEEE  Computer  Graphics  and  Applications  Magazine  was  not  started  until  1981, 
and  the  ACM  Transactions  on  Graphics  began  publication  only  in  1982.  Papers  on  image 
processing  appeared  in  signal  processing  journals  for  many  years;  the  IEEE  Transactions  on 
Image  Processing  was  not  started  until  1992.] 

A  conference  devoted  to  OCR  was  held  in  1962  [58],  and  a  Postal  Service  conference 
was  held  in  1969  [135];  work  in  the  USSR  is  described  in  [146].  Conferences  on  industrial 
automation  applications  did  not  begin  until  the  1970’s  [255,  257,  184,  46,  148,  34].  Early 
conferences  on  biomedical  image  analysis  dealt  primarily  with  microscope  images  [204,  249, 
51,  263,  130,  238,  302,  75,  309],  but  by  the  1970’s,  work  on  radiographs  was  also  being 
reported  [98,  111,  112,  199,  298,  306].  Later  publications  also  dealt  with  electron  microscopy 
[107, 108,  232],  and  there  were  also  specialized  conferences  on  topics  such  as  cytogenetics  [299, 
85].  There  were  Engineering  Foundation  Conferences  on  Automatic  Cytology  throughout 
the  1970’s,  and  a  journal  on  automated  cytology  was  started  in  1979  [262]. 

An early collection of papers on remote sensing is [154]; an early automatic target recognition
system is described in [116, 115]. A series of conferences on Machine Processing of
Remotely Sensed Data was initiated in 1973 [292]. Other conferences on the subject were
[297, 149, 304, 307, 264]; on image processing in astronomy see [44], and on target recognition
see [69, 241, 9]. A collection of important reprints on the subject is [17], and a book
is  [245].  An  early  conference  on  forensic  applications  was  [277].  For  early  work  on  face 
recognition  see  [139,  81,  133].  Some  early  conferences  on  high-energy  physics  applications 
are  [282,  231,  285,  286,  41].  A  mid-1970’s  book  [217]  consisted  of  review  articles  on  the  major 
application  areas. 

Computer  architectures  appropriate  for  image  processing  and  analysis  have  been  an  issue 
of  interest  since  the  1950’s  [253].  A  workshop  on  the  subject  was  held  in  the  late  1970’s  [189], 
and  by  the  1980’s,  regular  conferences  on  the  subject  were  being  held.  Another  topic  which 
began  to  receive  attention  in  the  1970’s,  and  is  now  of  major  interest,  is  that  of  image  (and 
video)  databases  [301,  18]. 

7  Concluding  remarks 

Research on image analysis and computer vision over the past 40 years has led to the formulation
of many elegant mathematical models and algorithms. Unfortunately, most vision
problems, even those that were first tackled in the 1950’s, are mathematically ill-defined
(reading handwritten words, counting cells, recognizing buildings). Real-world visual domains
do not satisfy simple mathematical (even probabilistic) models. Even if adequate
scene models could be formulated, problems that involve inferring information about a scene
from images are often mathematically ill-posed or computationally intractable; but the primary
reason why vision is hard for computers is that the scene models used (often tacitly)
in  today’s  computer  vision  systems  are  unrealistic,  and  this  situation  is  likely  to  persist  for 
a  long  time  to  come. 

The  inadequacy  of  our  scene  models  does  not  imply  that  computer  vision  systems  will 
never  perform  adequately.  Animals  (and  humans)  use  vision  quite  effectively  in  the  real 
world. A possible basis for this is that biological visual systems make use of redundant
visual data and process it in redundant ways. Computer vision systems usually avoid such
redundancy  in  order  to  reduce  computational  cost.  But  redundancy  may  allow  the  biological 
systems  to  detect  processing  errors,  since  they  are  likely  to  give  results  that  are  inconsistent 
or  non-persistent. 

Computer  vision  systems  are  just  reaching  the  levels  of  processing  power  that  will  allow 
them  to  handle,  in  real  time,  amounts  of  input  data  comparable  to  those  handled  by  biological 
visual  systems,  and  to  apply  multiple  processing  techniques  to  the  data.  The  techniques  used 
nowadays  in  these  systems  are  often  quite  simple,  even  “brute  force”;  but  more  complex 
algorithms,  which  today  can  only  be  demonstrated  in  the  laboratory,  will  run  at  video 
rates  on  tomorrow’s  processors.  As  processing  power  continues  to  increase,  some  of  these 
algorithms  will  be  applied  to  real-world  problems;  as  a  result,  the  performance  of  computer 
vision  systems  will  gradually  improve. 

Figure 1: An “imagecentric” view of data processing: The relationship between image
processing, pattern recognition, and computer graphics.

Figure 2: (a) An example of the use of overstrike on an alphanumeric printer to generate
“halftone” images. (b) The sets of overstruck characters used to produce (a). [From
R.C. Gonzalez and P.A. Wintz, Digital Image Processing, Addison-Wesley, Reading, MA,
1977.]

Figure 3: A general image analysis paradigm.

Figure 4: A general scene analysis paradigm (for a single static image).